ABSTRACT



Signal detection theory (SDT) has been widely applied in 
situations where observers attempt to detect or discriminate between two or 
more events. The usefulness of SDT with latent classes was illustrated in the 
context of an educational situation that can be readily conceptualized as a 
signal detection task: grading term papers. The approach assumes that the 
graders attempt to discriminate between latent classes of papers by using
decision criteria in combination with their perceptions of the quality of
each paper. Three graders (a professor and two graduate assistants) graded 85 
term papers from a graduate course on measurement. A fit of the latent class 
signal detection model indicates that the graders discriminate equally 
between two latent classes, but their response criteria differ. These are 
similar to results typically found in signal detection experiments with 
observed events. The findings show that SDT offers a simple summary of the 
graders' performance in terms of their ability to discriminate between the 
latent classes and their arbitrary use of grade categories. (Contains 1 
figure, 2 tables, and 11 references.) (SLD) 
Signal detection theory (SDT) has been widely applied in situations where observers
attempt to detect or discriminate between two or more events (see Macmillan & Creelman, 
1991). It has played an important role in memory research in psychology, for example, in part 
because it provides a measure of memory that is separate from arbitrary response effects. In this 
type of application, it is known whether or not an event actually occurred (e.g., whether or not a 
word was previously presented during a study period). In other situations, however, the task is 
again one of signal detection, but the event is not observed. An example is attempting to 
determine whether or not a person has a psychological or physical condition, such as depression 
or disease, where the true state of the person is not known. In this case, the psychological theory 
is the same (i.e., SDT), with the only difference being that the events of interest are latent. 

Signal detection theory can readily be applied to this type of situation by incorporating it 
into a latent class analysis (Dayton, 1998; McCutcheon, 1987). As shown below, latent class 
signal detection models are simply generalized linear models with latent categorical predictors 
(one or more signals versus noise; see Figure 1); they are closely related to located latent class 
models (e.g., Formann, 1985; Uebersax, 1993) and to discretized latent trait models (Clogg, 
1988; Heinen, 1996), but they differ with respect to parameterization and perspective. For 
example, the latent classes are viewed in signal detection as being qualitative, and not as arising 
from the discretization of a continuous latent variable. 

The utility of SDT with latent classes is illustrated in the context of an educational 
situation that can readily be conceptualized as a signal detection task: grading term papers. The 
approach assumes that the graders attempt to discriminate between latent classes of papers by 
using decision criteria in combination with their perception of the quality of each paper. It is
shown that SDT offers a simple summary of the graders' performance in terms of their ability to
discriminate between the latent classes and their arbitrary use of grade categories. The approach
also provides measures of the reliability of the graders individually and as a set. Some evidence 
as to the validity of the latent classes, namely their relation to students’ average grade on two 
course exams, is also presented. 

Consider the situation where J independent observers examine stimuli and make decisions
as to which of C events are present; the discussion here focuses on the basic situation with two 
events (signal and noise), but the extension to three or more events is straightforward. A general 
signal detection model for binary or rating responses and two events is 

\[
p(Y_j \le k \mid X) = F(c_{jk} - d_j X), \qquad (1)
\]

where $K$ is the number of response categories, $1 \le k \le K-1$, $X$ is a dummy coded variable that
indicates the two events, $p(Y_j \le k \mid X)$ is the cumulative probability of response $k$ by observer $j$
conditional on $X$, $c_{jk}$ is the distance of the $k$th response criterion from the mode of the reference
distribution for the $j$th observer, $d_j$ are the distances between the two underlying distributions for
the $j$th observer, and $F$ is a cumulative distribution function (CDF) for the underlying
distributions. The inverse of F corresponds to a link function g, with common choices being the 
logit, inverse normal, and complementary log log links, which give signal detection models 
based on logistic, normal, and extreme value distributions, respectively (DeCarlo, 1998). 
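
As a minimal sketch of Equation 1, the Python snippet below evaluates the cumulative response probabilities F(c_jk - d_j X) for a single observer under the three links named above. The criterion values and the distance d are hypothetical numbers chosen only for illustration, and the function names are not from the paper.

```python
import math

# CDFs F whose inverses are the logit, inverse normal (probit), and
# complementary log-log links.
def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def extreme_value_cdf(z):
    return 1.0 - math.exp(-math.exp(z))

def cumulative_probs(criteria, d, x, cdf):
    """Return p(Y <= k | X = x) = F(c_k - d * x) for k = 1, ..., K - 1."""
    return [cdf(c - d * x) for c in criteria]

# Hypothetical values for one observer with K = 4 response categories.
criteria = [-1.0, 1.0, 3.0]   # c_1 < c_2 < c_3 (assumed)
d = 2.0                       # distance between the two distributions (assumed)

for name, cdf in [("logistic", logistic_cdf),
                  ("normal", normal_cdf),
                  ("extreme value", extreme_value_cdf)]:
    noise = cumulative_probs(criteria, d, 0, cdf)    # X = 0
    signal = cumulative_probs(criteria, d, 1, cdf)   # X = 1
    print(name, [round(p, 3) for p in noise], [round(p, 3) for p in signal])
```

Whatever the link, a positive d lowers every cumulative probability when X = 1, which shifts responses toward the higher categories.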

To extend the model to the situation where the events are latent, the observed categorical 
variable $X$ is replaced by a latent categorical variable, say $X_c$, with $c = 1, 2$. The model can be
incorporated into a restricted latent class model by using differences between the cumulative 
probabilities, 



\[
p(Y_j = k \mid X_c) =
\begin{cases}
F(c_{jk} - d_j X_c), & k = 1 \\
F(c_{jk} - d_j X_c) - F(c_{j,k-1} - d_j X_c), & 1 < k < K \\
1 - F(c_{j,K-1} - d_j X_c), & k = K,
\end{cases}
\]
for the conditional probabilities of a latent class model, which for three observers and two latent
classes can be written as

\[
p(Y_1, Y_2, Y_3) = \sum_{c=1}^{2} p(X_c)\, p(Y_1 \mid X_c)\, p(Y_2 \mid X_c)\, p(Y_3 \mid X_c), \qquad (2)
\]

where $\sum_k p(Y_j = k \mid X_c) = 1$ for each observer, and $\sum_c p(X_c) = 1$. The above follows from the
assumptions that there are two mutually exclusive and exhaustive latent classes and the $J$
observers are independent. 
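
To make Equations 1 and 2 concrete, here is a minimal Python sketch that converts cumulative probabilities into category probabilities and sums over two latent classes to obtain the probability of a response pattern for three observers. It assumes a logistic F and latent classes coded X = 0 and X = 1; all parameter values are hypothetical, and the function names are illustrative rather than part of any package.

```python
import math
from itertools import product

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(criteria, d, x):
    """p(Y = k | X = x), k = 1..K, as differences of cumulative probabilities."""
    cum = [logistic_cdf(c - d * x) for c in criteria] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def pattern_prob(pattern, criteria_by_obs, d_by_obs, class_sizes):
    """p(Y_1, ..., Y_J) = sum_c p(X_c) * prod_j p(Y_j | X_c), classes coded 0, 1."""
    total = 0.0
    for x, size in enumerate(class_sizes):
        term = size
        for y, criteria, d in zip(pattern, criteria_by_obs, d_by_obs):
            term *= category_probs(criteria, d, x)[y - 1]
        total += term
    return total

# Hypothetical parameters: three observers, K = 4 categories, two classes.
criteria_by_obs = [[-1.0, 1.0, 3.0], [0.0, 2.0, 4.0], [-1.5, 0.5, 3.0]]
d_by_obs = [2.0, 2.0, 2.0]
class_sizes = [0.5, 0.5]

print(round(pattern_prob((2, 2, 2), criteria_by_obs, d_by_obs, class_sizes), 4))

# The 4**3 = 64 response patterns should have probabilities summing to 1.
total = sum(pattern_prob(p, criteria_by_obs, d_by_obs, class_sizes)
            for p in product(range(1, 5), repeat=3))
print(round(total, 6))
```

The final check reflects the two constraints stated above: the category probabilities sum to one within each class, and the class sizes sum to one.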

Equations 1 and 2 offer a general class of signal detection models with latent classes that 
can be used in situations that can be conceptualized in terms of SDT, such as when observers 
attempt to detect or discriminate latent categorical events. The model can be fit using software 
for latent class analysis that allows one to restrict the conditional probabilities using different 
cumulative link functions, such as LEM (Vermunt, 1997). 

Methods 

Three graders (a professor and two graduate assistants) graded 85 term papers from a
graduate course on measurement. The papers were graded on a scale from 1 to 4, with the graders
instructed to consider a below average paper as 1, an average paper as 2, an above average paper 
as 3, and an excellent paper as 4. Graders were instructed to first read five or six papers, chosen 
at random, before grading any of the papers, to obtain an idea of what the average paper might be 
like. 
Results



Table 1 shows information-based goodness-of-fit indices, namely the Bayesian information
criterion (BIC) and Akaike’s information criterion (AIC) (see Agresti, 1990), for latent class
logistic signal detection models with one to four latent classes. The criteria can be
used to compare nested and non-nested models, with smaller values indicating a better model. 
The eigenvalues of the information matrix did not indicate identification problems for the two or 
three class models, but there were near zero values for four or more classes. Different runs with 
different starting values resulted in recovery of the parameter estimates for the two and three 
class models. 
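
One quick consistency check on Table 1: under the usual definitions AIC = -2 log L + 2p and BIC = -2 log L + p ln N (assumed here; the paper does not restate them), the difference BIC - AIC equals p(ln N - 2), so the number of free parameters p can be backed out from the two criteria with N = 85 papers. The sketch below does this in Python; the variable and function names are illustrative.

```python
import math

N = 85  # number of term papers

# (BIC, AIC) for one to four latent classes, taken from Table 1.
table1 = {1: (675.81, 653.83), 2: (671.20, 639.45),
          3: (687.11, 645.59), 4: (697.32, 646.02)}

def implied_params(bic, aic, n):
    """Solve BIC - AIC = p * (ln n - 2) for p (assumes the standard definitions)."""
    return (bic - aic) / (math.log(n) - 2.0)

for classes, (bic, aic) in table1.items():
    p = implied_params(bic, aic, N)
    print(classes, round(p, 2))

# Number of classes with the smallest value of each criterion.
best_bic = min(table1, key=lambda c: table1[c][0])
best_aic = min(table1, key=lambda c: table1[c][1])
print(best_bic, best_aic)
```

The implied counts come out very close to 9, 13, 17, and 21. The value of 13 for two classes agrees with the three detection parameters, nine criteria, and one class size reported in Table 2, and the step of four per added class is consistent with each extra class adding one detection parameter per grader plus one class size.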

The values of both the BIC and AIC are smallest for the model with two latent classes. 
Thus, the results suggest that the graders can discriminate between two latent classes (e.g., grades 
of A and B). Goodness of fit statistics for the two class model are $X^2 = 25.97$, df = 50, p = .998 for
the chi-square statistic and $L^2 = 30.12$, df = 50, p = .988 for the likelihood ratio statistic, both of
which suggest acceptable fit. 
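
The degrees of freedom and p-values reported above can be reproduced with a few lines of Python. The sketch assumes that df is the number of independent cells in the 4 x 4 x 4 table of response patterns minus the number of free parameters (three detection parameters, nine criteria, and one class size), and it uses the closed-form chi-square tail probability that holds for even df; the function name is illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only when df is even.

    For df = 2m, P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

cells = 4 ** 3            # response patterns for 3 graders x 4 categories
n_params = 3 + 9 + 1      # d_j, c_jk, and one class size (assumed count)
df = cells - 1 - n_params
print(df)

print(round(chi2_sf_even_df(25.97, df), 3))  # Pearson chi-square
print(round(chi2_sf_even_df(30.12, df), 3))  # likelihood ratio statistic
```

Both statistics fall well below their degrees of freedom, which is why the tail probabilities are so close to one.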

The top part of Table 2 shows the parameter estimates and standard errors for the model 
with two latent classes. The estimated sizes of the latent classes are .46 and .54 for classes 1 and 
2, respectively. Inspection of the estimated conditional probabilities (not shown) shows that 
latent Class 1 represents a lower latent class and Class 2 a higher latent class. The detection 
parameters are close in magnitude (that for observer 1 is higher, but the standard error is large), 
indicating that the graders discriminate equally. A likelihood ratio test of a restricted model with 
detection parameters equal across the three observers gives LR = 1.22, df = 2, p = .54, so the
restricted model is not rejected; the values of BIC and AIC are also both smaller than those for 
the unrestricted model. The lower half of Table 2 shows the parameter estimates for the restricted
model. The estimate of d is 2.36, so the odds of a higher response are exp(2.36) = 10.6 times
higher for class 2 than for class 1, which is comparable to detection found in memory and
psychophysics experiments. The table also shows that the standard errors for the restricted model
tend to be considerably smaller. A correlation-like conditional measure of reliability, Yule’s Q,
can be obtained from d as [exp(d) - 1]/[exp(d) + 1], which in this case gives .83. Lambda, the
relative reduction in prediction error, provides a measure of the reliability of the observers as a 
set (see Clogg & Manning, 1996), and in this case its estimate is .71. 
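
The arithmetic in this paragraph is easy to verify. The Python sketch below recomputes the odds ratio and Yule's Q from the reported estimate d = 2.36, and the p-value of the likelihood ratio test, using the fact that the chi-square tail probability with 2 degrees of freedom is exp(-LR/2). Variable names are illustrative.

```python
import math

d = 2.36    # common detection estimate from the restricted model
lr = 1.22   # likelihood ratio statistic for equal detection, df = 2

odds_ratio = math.exp(d)
yules_q = (math.exp(d) - 1.0) / (math.exp(d) + 1.0)  # equals tanh(d / 2)
lr_p_value = math.exp(-lr / 2.0)                     # chi-square tail for df = 2

print(round(odds_ratio, 1))   # 10.6
print(round(yules_q, 2))      # 0.83
print(round(lr_p_value, 2))   # 0.54
```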

The estimates of the response criteria suggest that the three graders differ, and a 
likelihood ratio test of the restriction of equal criteria across the graders leads to rejection of the 
restriction. The main difference, as can be seen in Table 2, is that grader B had a higher criterion
for a grade of 2 than the other two graders. Since the graders were instructed to consider 2 as
average, this suggests that grader B had a stricter view as to what average is.

Each paper can be classified into one of the latent classes using the modal posterior 
probability, that is, $p(X_c \mid Y_1, Y_2, Y_3)$. Evidence as to the validity of the classification is given by a
comparison of the average score on two course exams across the latent classes; the mean was
76.4 for Class 1 (the lower class) and 81.5 for Class 2, with the difference being significant
(t = 2.6, df = 83, p = .012). Thus, students in the higher latent class had an average score on two
course exams that was about five points higher. Note that if one wishes to assign finer ordinal 
grades to individuals (e.g., A, A-, B+, B), this can be done using the modal posterior 
probabilities by grouping the probabilities into categories. This is consistent with Clogg’s (1988; 
also see Uebersax, 1993) suggestion to use the product of the posterior probabilities and values 
assigned to the latent classes in order to assign scores to individuals. The difference in this case is 
that the latent classes are treated as purely categorical, so the values assigned to the latent classes
are simply zero and one (in which case Clogg’s suggested scoring system simply uses the
posterior probabilities as scores). 
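
As an illustration of the classification step, the following Python sketch computes the posterior class probabilities for a response pattern from the equal-detection estimates in Table 2. It assumes a logistic F, classes coded X = 0 (Class 1) and X = 1 (Class 2), and class sizes of .47 and .53 (the second taken as one minus the first); the function names and the two example grade patterns are illustrative, and the output is only as accurate as the rounded estimates.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(criteria, d, x):
    """p(Y = k | X = x) for k = 1..K, from differences of F(c_k - d * x)."""
    cum = [logistic_cdf(c - d * x) for c in criteria] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Equal-detection estimates from Table 2 (rounded as reported).
D = 2.36
CRITERIA = {"A": [-1.04, 1.96, 3.42],
            "B": [0.55, 2.39, 4.52],
            "C": [-1.36, 1.11, 3.49]}
CLASS_SIZES = [0.47, 0.53]  # Class 1 (X = 0), Class 2 (X = 1); .53 is assumed

def posterior(grades):
    """Posterior p(X_c | Y_A, Y_B, Y_C) for grades given as a dict of 1..4."""
    joint = []
    for x, size in enumerate(CLASS_SIZES):
        like = size
        for grader, y in grades.items():
            like *= category_probs(CRITERIA[grader], D, x)[y - 1]
        joint.append(like)
    total = sum(joint)
    return [j / total for j in joint]

# Two hypothetical grade patterns, classified by the modal posterior probability.
for grades in ({"A": 1, "B": 1, "C": 2}, {"A": 3, "B": 2, "C": 3}):
    post = posterior(grades)
    modal_class = 1 + post.index(max(post))
    print(grades, [round(p, 3) for p in post], "-> Class", modal_class)
```

With the classes scored zero and one, the second posterior probability is itself the score described above, so a paper with mostly low grades gets a score near zero and one with mostly high grades a score near one.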

In sum, a fit of the latent class signal detection model indicates that the graders 
discriminate equally between two latent classes, but their response criteria differ; these are 
similar to results typically found in signal detection experiments with observed events. The 
magnitude of d and the measures of conditional reliability indicate good discrimination; the 
latent classes also differed with respect to average exam grade, which provides evidence as to 
validity. 

Conclusion 

Paper and essay grading has been studied from several perspectives, such as that offered 
by the Rasch model and by item response theory. The approach via SDT provides a somewhat 
different perspective. For one, the latent classes are viewed as being categorical, and not as 
arising from a discretization of a latent trait. The result is that measurement in this case is 
qualitative. Second, the discrimination parameter in SDT is viewed as a fixed characteristic of 
the observer, whereas the response criteria are not; in item response theory the discrimination and 
item difficulty (rater severity) parameters are both considered fixed. The view via SDT also 
suggests that a large body of research and theory in experimental psychology is relevant to paper 
and essay grading, and it suggests new research, such as attempting to manipulate the graders’ 
response criteria across sessions to see if their discrimination remains constant, as found in 
classic experiments in SDT with observable events. This would provide an important 
experimental validation of the model and theory. 
Table 1

Information Criteria for Latent Class Signal Detection Models

# of Classes      BIC       AIC

     1          675.81    653.83
     2          671.20    639.45
     3          687.11    645.59
     4          697.32    646.02

Notes: BIC = Bayesian information criterion, AIC = Akaike’s information criterion.
Table 2

Parameter Estimates and Standard Errors for Latent Class Signal Detection Model with Two
Classes

                  d_j            c_j1           c_j2          c_j3        p(X_1)

Observer A    3.59 (1.60)    -0.88 (0.45)   2.93 (1.48)   4.55 (1.59)     .46
Observer B    2.09 (0.64)     0.46 (0.44)   2.20 (0.59)   4.30 (0.74)
Observer C    2.04 (0.68)    -1.41 (0.44)   0.96 (0.51)   3.23 (0.68)

Equal Detection:

                  d_j            c_j1           c_j2          c_j3        p(X_1)

Observer A    2.36 (0.37)    -1.04 (0.44)   1.96 (0.55)   3.42 (0.56)     .47
Observer B    2.36 (0.37)     0.55 (0.51)   2.39 (0.55)   4.52 (0.64)
Observer C    2.36 (0.37)    -1.36 (0.47)   1.11 (0.54)   3.49 (0.56)